This paper provides further insight into the key concept of missing at
random (MAR) in incomplete data analysis. Following the usual selection
modelling approach we envisage two models with separable parameters: a
model for the response of interest and a model for the missing data
mechanism (MDM). If the response model is given by a complete density
family, then frequentist inference from the likelihood function ignoring
the MDM is valid if and only if the MDM is MAR. This necessary and
sufficient condition also holds more generally for models for coarse data,
such as censoring. Examples are given to show the necessity of the
completeness of the underlying model for this equivalence to hold.
1. Introduction. A full parametric model for missing data comprises two
components: one is for the complete data and the other is for the missing
data mechanism (MDM). The former describes the probability distribution
that governs the data generation process of interest, while the latter
characterizes the observation process by which some data may be missing. The
parameterizations of these two processes are often assumed to be separable,
and our target is to make inference about the parameters involved in the
complete data model using only the available incomplete data.

In practice, modelling incomplete data is a very difficult task since in
most cases the incomplete data themselves contain little or no information
about the MDM. The fundamental and most widely used assumption about
the MDM is that it is a missing at random (MAR) model [Rubin (1976)].
The basic idea is that the probability that a response variable is observed
can depend only on the values of those other variables which have been
observed. This concept has been extensively studied, and effective
computational methods for handling missing data under the MAR assumption have
been well developed, for example, using the EM algorithm. Good references
include Tanner (1993), Schafer (1997), Kenward and Molenberghs (1998)
and Little and Rubin (2002) among many others.

A closely related, but logically distinct, concept is ignorability. The basic 
idea here is that inference based on the joint specification of both complete 
data and MDM models is the same as the inference we would obtain if we 
used the complete data model only, simply integrating out the values of any 
variables which are missing. It is well known that MAR, together with the 
assumption of separable parameters, is a sufficient condition for ignorability 
of the MDM in likelihood based inference. It is, however, not a necessary 
condition. 

Although these concepts have been widely discussed, there have been some 
inconsistencies between different authors on how they are defined and inter- 
preted, and in the choice of terminology. The Weblist impute@utdallas.edu 
gives an interesting summary of views. We avoid ambiguities by giving some 
precise definitions in Section 2. 

In Section 3 we show that for models given by a complete family of dis- 
tributions, MAR is both necessary and sufficient for ignorability. The result 
depends on a heritable property of completeness: that, with suitable repa- 
rameterizations, completeness of a multivariate distribution implies com- 
pleteness of all conditional and marginal distributions. Examples are given 
to show that, for inference in a family of distributions which is not complete, 
an MDM can be ignorable without being MAR. 

This necessary and sufficient condition is extended in Section 4 to the 
wider concept of coarsening at random introduced by Heitjan and Rubin 
(1991). Here, ideas for missing data are generalized to other kinds of incom- 
plete data such as censoring or rounding. 

Section 5 offers some concluding remarks. 

2. Missing data and likelihood ignorability. Let $Y = (Y_1, \ldots, Y_k)^T$ be a
$k$-dimensional random vector with probability density function $f(y;\theta)$ on
$\mathcal{Y} \subset \mathbb{R}^k$, where $\theta \in \Theta$ is a $d$-dimensional parameter of interest. Suppose that
the observation process of $Y$ suffers from missing data and hence, associated
with $Y$ there is also a binary random vector $R = (R_1, \ldots, R_k)^T$ indicating the
observational status of $Y$, where $R_i$ takes the value 0 when the observation
of $Y_i$ is missing and the value 1 when $Y_i$ is observed, $i = 1, \ldots, k$. Denote the
range of $R$ by

$$\mathcal{R} = \{(r_1, \ldots, r_k) : r_i = 0 \text{ or } 1,\ i = 1, \ldots, k\} = \{0,1\}^k.$$

We assume that the parameterization of the joint distribution of Y and 
R can be put into the selection model form 

$$f(y, r; \theta, \psi) = f(y;\theta)\, f(r|y;\psi), \qquad (\theta,\psi) \in \Theta \times \Psi, \tag{2.1}$$

in which the parameters $\theta$ and $\psi$ are assumed to be distinct [Rubin (1976)].
The conditional density $f(r|y;\psi)$ characterizes the probabilistic relation
between the data-observation process and the values of the data themselves
and hence specifies a model for the MDM. The joint distribution of $Y$ and $R$
can also be written in the pattern mixture form [Little (1994)] in which we
model instead the conditional distribution of $Y$ given $R$, but the
parameterization in (2.1) makes the MAR condition more transparent for the present
discussion.

The pair of random variables $(Y, R)$ induces an observable random
variable $Z$. Using a notation analogous to that for the coarsening function as
defined in Heitjan and Rubin (1991), $Z$ is

$$Z = Z(Y,R) = (Z_1,\ldots,Z_k)^T, \quad \text{where } Z_i = \begin{cases} Y_i, & \text{if } R_i = 1,\\ \mathbb{R}, & \text{if } R_i = 0, \end{cases} \quad i = 1,\ldots,k. \tag{2.2}$$

For notational convenience we allow the symbol $\mathbb{R}$ to appear in any position
in the vector argument of a multivariate density function, using it to denote
the marginal density of the other variables. For example, suppose $f(t_1,t_2)$
is a density on $T_1 \times T_2 \subset \mathbb{R}^2$ and $f_i(t_i)$, $i = 1,2$, are the marginal densities.
Then we identify $f(t_1,\mathbb{R})$ with $f_1(t_1)$ and $f(\mathbb{R},t_2)$ with $f_2(t_2)$. Trivially,
$f(\mathbb{R},\mathbb{R}) = 1$. With this convention the density of $Z$ can be expressed as



$$f(z;\theta,\psi) = \int f(y;\theta)\, f(r|y;\psi)\,dy^{(1-r)} = f(y^{(r)};\theta) \int f(y^{(1-r)}|y^{(r)};\theta)\, f(r|y;\psi)\,dy^{(1-r)}, \tag{2.3}$$

where $1$ is the $k$-dimensional vector with all elements equal to 1, $y^{(r)}$ and
$y^{(1-r)}$ are, respectively, the observed subvector and the missing subvector
of $y$ given by

$$y^{(r)} = (y_i : r_i = 1,\ i = 1,\ldots,k)^T$$

and

$$y^{(1-r)} = (y_i : r_i = 0,\ i = 1,\ldots,k)^T,$$

and for each variable $y_i$ contained in $y^{(1-r)}$ the integral in (2.3) is over its
whole range.
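
As a concrete illustration of (2.2) and of the observed and missing subvectors, the following minimal Python sketch builds $z$ from a pair $(y, r)$. The function names `observe` and `split` are hypothetical, and `None` is used here only as a stand-in for the placeholder symbol that marks a missing position; neither is part of the notation above.

```python
from typing import List, Optional, Sequence, Tuple


def observe(y: Sequence[float], r: Sequence[int]) -> List[Optional[float]]:
    """Return z = Z(y, r): keep y_i where r_i == 1, put None where r_i == 0."""
    if len(y) != len(r):
        raise ValueError("y and r must have the same length")
    return [yi if ri == 1 else None for yi, ri in zip(y, r)]


def split(y: Sequence[float], r: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (observed subvector, missing subvector) of y for the pattern r."""
    if len(y) != len(r):
        raise ValueError("y and r must have the same length")
    y_obs = [yi for yi, ri in zip(y, r) if ri == 1]
    y_mis = [yi for yi, ri in zip(y, r) if ri == 0]
    return y_obs, y_mis


# Illustrative values only.
y = [0.3, -1.2, 2.5]
r = [1, 0, 1]
z = observe(y, r)
y_obs, y_mis = split(y, r)
print(z, y_obs, y_mis)
```

Only `z` (equivalently, `r` together with `y_obs`) is available to the analyst; `y_mis` is what the integral in (2.3) averages over.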

In the above setting Rubin's MAR condition [Rubin (1976)] can be ex- 
pressed as follows. 

Definition 2.1. A MDM is said to be MAR if the conditional
distribution $f(r|y;\psi)$ has the special form

$$f(r|y;\psi) = h_r(z(y,r);\psi) \quad \text{for all } (y, r) \in \mathcal{Y} \times \mathcal{R}, \tag{2.4}$$

where, for any fixed $\psi$ and $r$, $h_r(\cdot;\psi)$ is a function mapping $\mathbb{R}^{1^T r}$ into $[0, 1]$.
Note that the dimension of the space $\mathbb{R}^{1^T r}$ varies with the value of $r$,
and hence $h_r(\cdot;\psi)$ is a family of $2^k$ functions indexed by the subscript $r$.
Under MAR the MDM depends on $(y,r)$ only through the function $z(y,r)$,
that is, through the observed part of the sample $y$.
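
Definition 2.1 can be probed numerically in a toy bivariate case. The sketch below is an illustration under stated assumptions, not a general test: the two logistic mechanisms, the value of the parameter and all function names are hypothetical, and a check on a finite grid can only detect a violation of MAR, never prove that MAR holds.

```python
import itertools
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mdm_mar(r, y, psi):
    """Hypothetical MDM: Y1 always observed, P(Y2 missing) depends on y1 only."""
    p_miss = logistic(psi * y[0])
    return {(1, 0): p_miss, (1, 1): 1.0 - p_miss}.get(r, 0.0)


def mdm_not_mar(r, y, psi):
    """Hypothetical MDM: P(Y2 missing) depends on the possibly missing y2."""
    p_miss = logistic(psi * y[1])
    return {(1, 0): p_miss, (1, 1): 1.0 - p_miss}.get(r, 0.0)


def violates_mar(mdm, r, psi, grid, tol=1e-12):
    """True if f(r|y) changes while the observed part of y is held fixed."""
    observed = [i for i, ri in enumerate(r) if ri == 1]
    seen = {}
    for y in itertools.product(grid, repeat=len(r)):
        key = tuple(y[i] for i in observed)  # the observed subvector of y
        value = mdm(r, y, psi)
        if key in seen and abs(seen[key] - value) > tol:
            return True
        seen.setdefault(key, value)
    return False


grid = [-1.0, 0.0, 1.0]
patterns = [(1, 0), (1, 1)]
mar_ok = not any(violates_mar(mdm_mar, r, 0.7, grid) for r in patterns)
not_mar_detected = violates_mar(mdm_not_mar, (1, 0), 0.7, grid)
print(mar_ok, not_mar_detected)
```

Note that for the pattern $r = (1,1)$ every component is observed, so dependence on $y_2$ is permitted there; the violation in the second mechanism arises only for $r = (1,0)$.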

A MDM model is ignorable for inference about the parameter $\theta$ if the
inference based on the combination of both the complete data model and
the MDM model coincides with the inference based on the complete data
model alone. For likelihood based inference we assume that ignorability is
an intrinsic property of the joint model $f(y, r; \theta, \psi)$ rather than a property of
any specific sample realization. Thus we are interested in frequentist
inference from the likelihood function, rather than inference from the particular
likelihood function we get from the observed sample. To emphasize this we
use the term likelihood ignorable (LIG) in the following definition.

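Definition 2.2. A MDM is said to be LIG if the integral

$$\int f(y^{(1-r)}|y^{(r)};\theta)\, f(r|y;\psi)\,dy^{(1-r)} \tag{2.5}$$

is free of $\theta$ for almost all realizations of $(y,r) \in \mathcal{Y} \times \mathcal{R}$ and for all $(\theta,\psi) \in \Theta \times \Psi$.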

The contribution of observation $z$ to the likelihood is the product of the
two terms in the right-hand side of (2.3). LIG means that the second term
[the integral over $y^{(1-r)}$] does not affect the likelihood as far as inference
about $\theta$ is concerned. Equivalently, the contribution of this second term to
the log likelihood disappears when we differentiate with respect to $\theta$. All
that matters is the first term, which is just the marginal joint density of
those components of $Y$ which are actually observed.

Notice that MAR is a property of the conditional distribution $f(r|y;\psi)$,
whereas LIG depends on both $f(r|y;\psi)$ and the response model $f(y;\theta)$.

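Under the MAR model,

$$\int f(y^{(1-r)}|y^{(r)};\theta)\, f(r|y;\psi)\,dy^{(1-r)} = \int f(y^{(1-r)}|y^{(r)};\theta)\, h_r(z(y,r);\psi)\,dy^{(1-r)} = h_r(z(y,r);\psi),$$

since $h_r(z(y,r);\psi)$ does not involve $y^{(1-r)}$ and the conditional density
integrates to 1. The integral is therefore independent of $\theta$. Hence MAR is a
sufficient condition for LIG. We seek the conditions under which MAR is also
a necessary condition for LIG.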
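
The sufficiency argument can be checked numerically in a small case. The sketch below assumes, purely for illustration, that $Y_1$ and $Y_2$ are independent with $Y_2 \sim N(\theta_2, 1)$, so that the conditional density of $Y_2$ given $Y_1$ is $N(\theta_2, 1)$; the two logistic mechanisms, the midpoint-rule quadrature and all names are hypothetical choices, not part of the development above.

```python
import math


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def h_mar(y1, y2, psi):
    """P(r = (1,0) | y) depending on the observed y1 only (MAR)."""
    return logistic(psi * y1)


def h_not_mar(y1, y2, psi):
    """P(r = (1,0) | y) depending on the missing y2 (not MAR)."""
    return logistic(psi * y2)


def lig_integral(h, y1, theta2, psi, lo=-12.0, hi=12.0, n=4000):
    """Midpoint-rule approximation of the integral (2.5) for r = (1,0)."""
    step = (hi - lo) / n
    total = 0.0
    for j in range(n):
        y2 = lo + (j + 0.5) * step
        total += normal_pdf(y2, theta2) * h(y1, y2, psi)
    return total * step


thetas = (-1.0, 0.0, 1.0)
mar_values = [lig_integral(h_mar, 0.5, t, 0.8) for t in thetas]
not_mar_values = [lig_integral(h_not_mar, 0.5, t, 0.8) for t in thetas]
print(mar_values)      # constant in theta, equal to logistic(0.8 * 0.5)
print(not_mar_values)  # varies with theta
```

Under the MAR mechanism the integral equals $h_r$ for every value of the parameter, whereas under the second mechanism it moves with $\theta_2$, so that mechanism is not LIG for this family.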
3. Necessary and sufficient conditions for LIG. In this section we show
that if the family of density functions $f(y;\theta)$ forms a complete class, then
MAR is both necessary and sufficient for LIG.

First some preliminaries about completeness. Recalling the elementary
definition [e.g., Zacks (1971)], a family of probability density functions
$\{f(y;\theta) : \theta \in \Theta\}$ on $\mathcal{Y} \subset \mathbb{R}^k$ is said to be complete if the identity

$$\int t(y)\, f(y;\theta)\,dy = 0 \quad \text{for all } \theta \in \Theta$$

implies that $t(y) = 0$ for almost all $y \in \mathcal{Y}$.

Now let $Y^{(1)}$ be a subvector of the random vector $Y$, and let $Y^{(2)}$ be the
corresponding complementary subvector. Denote the sample spaces of $Y^{(i)}$
by $\mathcal{Y}^{(i)}$, $i = 1, 2$. Then the joint family $f(y;\theta)$ can be decomposed into

$$f(y;\theta) = f_1(y^{(1)};\theta)\, f_{2|1}\{y^{(2)}|y^{(1)};\theta(y^{(1)})\},$$

where $\theta(y^{(1)})$ is a function: $\mathcal{Y}^{(1)} \to \Theta$ [see Arnold, Castillo and Sarabia (1999)].
We remark that, in general, even if the joint density family can be identified
by the parameter $\theta$, neither the marginal density family $f_1(y^{(1)};\theta)$ nor the
conditional family $f_{2|1}\{y^{(2)}|y^{(1)};\theta(y^{(1)})\}$ is identified by the same parameter.
However, there is always a many-to-one function $\phi_1 : \Theta \to \Phi_1 \subset \Theta$ such that

$$\{f_1(y^{(1)};\phi_1) : \phi_1 \in \Phi_1\} = \{f_1(y^{(1)};\theta) : \theta \in \Theta\},$$

and the new parameter $\phi_1(\theta)$ is identified. Similarly, for any given $y^{(1)}$, the
conditional family $f_{2|1}$ can be identified by $\phi_2(\theta;y^{(1)})$. Detailed discussion of
the problems of reparameterization and identification will in general call for a
topological group structure in the parameter space $\Theta$, but for the purpose of
describing completeness we merely borrow the form of the parameterization
to make the representation clear.

The following lemma says that completeness is a heritable property from 
the joint density family to its marginal and conditional density families. 

Lemma 3.1. Suppose that $\{f(y;\theta) : \theta \in \Theta\}$ is a complete density family.
Then the following hold:

(a) the marginal family $[f_1\{y^{(1)};\phi_1(\theta)\} : \theta \in \Theta]$ is complete;

(b) for almost all $y^{(1)} \in \mathcal{Y}^{(1)}$ the conditional families
$[f_{2|1}\{y^{(2)}|y^{(1)};\phi_2(\theta;y^{(1)})\} : \theta \in \Theta]$ are complete.

See the Appendix for the proof of Lemma 3.1.

Now we apply Lemma 3.1 to the conditional density family $f(y^{(1-r)}|y^{(r)};\theta)$
to yield the following theorem.
Theorem 3.1. For the selection model (2.1) assume that $\{f(y;\theta) : \theta \in \Theta\}$
is a complete family. Then the necessary and sufficient condition for LIG
is that the MDM is MAR.

Proof. We only need to verify that LIG implies MAR. From
Definition 2.2 LIG implies that the integral (2.5) is independent of $\theta$ for any given
$r$ and almost all $y^{(r)}$. Denoting its values by $w(y^{(r)},r;\psi)$, we have the equality

$$\int \{f(r|y;\psi) - w(y^{(r)},r;\psi)\}\, f(y^{(1-r)}|y^{(r)};\theta)\,dy^{(1-r)} = 0. \tag{3.1}$$

Because of the inheritance property of completeness, $f(y^{(1-r)}|y^{(r)};\theta)$ are
complete density families for all $r \in \mathcal{R}$, and hence, for any values of $r$ and
$\psi$, (3.1) implies

$$f(r|y;\psi) = w(y^{(r)},r;\psi).$$

Thus the MDM $f(r|y;\psi)$ depends on $y$ only through $y^{(r)}$ and so must have
the form of $h_r(z(y,r);\psi)$ in (2.4). That is, the MAR condition holds. □

The following examples show that, for an incomplete density family, LIG 
does not guarantee MAR. 

Example 3.1. Consider the bivariate normal density family 
Clearly this is not a complete family since $E(Y_1 - 2Y_2) = 0$ for all values of
$\theta$.

Suppose that $Y_1$ is always observed but $Y_2$ may be missing. The MDM is
then characterized by the functions

$$h_{(1,0)}(y;\psi), \quad h_{(1,1)}(y;\psi) = 1 - h_{(1,0)}(y;\psi), \quad h_{(0,0)} = h_{(0,1)} = 0.$$

The MAR condition demands that $h_{(1,1)}(y;\psi)$ as a function of $y = (y_1,y_2)$
is independent of $y_2$ for all $\psi \in \Psi$. However, in this example the conditional
density of $Y_2$ given $Y_1$ is independent of $\theta$. Hence, for $r = (1,0)$ and any
arbitrary function $h_{(1,0)}(y;\psi)$, the integral

$$\int f(y^{(1-r)}|y^{(r)};\theta)\, f(r|y;\psi)\,dy^{(1-r)} = \int f(y_2|y_1)\, h_{(1,0)}(y;\psi)\,dy_2$$

does not depend on $\theta$. Thus in this case any MDM is LIG.
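
A small numerical sketch of this example follows. It assumes, for illustration only, a bivariate normal family with mean $(2\theta, \theta)$, unit variances and correlation $1/2$; this parameterization is an assumption chosen because it satisfies the two properties the example relies on, namely $E(Y_1 - 2Y_2) = 0$ and a conditional density of $Y_2$ given $Y_1$ that is free of $\theta$. The mechanism `h_not_mar` and the other names are likewise hypothetical.

```python
import math

RHO = 0.5  # assumed correlation between Y1 and Y2


def mean_vector(theta: float):
    """Assumed mean (2*theta, theta), so that E(Y1 - 2*Y2) = 0 for every theta."""
    return 2.0 * theta, theta


def conditional_y2_given_y1(theta: float, y1: float):
    """Mean and variance of Y2 | Y1 = y1 for unit variances and correlation RHO."""
    mu1, mu2 = mean_vector(theta)
    return mu2 + RHO * (y1 - mu1), 1.0 - RHO ** 2


def normal_pdf(x: float, mean: float, sd: float) -> float:
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def h_not_mar(y1: float, y2: float, psi: float) -> float:
    """A mechanism for r = (1,0) that depends on the missing y2 (not MAR)."""
    return 1.0 / (1.0 + math.exp(-(y1 + psi * y2)))


def integral_r10(h, y1, theta, psi, half_width=10.0, n=4000):
    """Midpoint-rule approximation of the integral of f(y2|y1) * h over y2."""
    mean, var = conditional_y2_given_y1(theta, y1)
    sd = math.sqrt(var)
    lo = mean - half_width * sd
    step = 2.0 * half_width * sd / n
    total = 0.0
    for j in range(n):
        y2 = lo + (j + 0.5) * step
        total += normal_pdf(y2, mean, sd) * h(y1, y2, psi)
    return total * step


thetas = (-3.0, 0.0, 3.0)
values = [integral_r10(h_not_mar, 0.4, t, 0.8) for t in thetas]
spread = max(values) - min(values)
print(values, spread)
```

Under the assumed family the conditional mean of $Y_2$ given $Y_1 = y_1$ reduces to $y_1/2$, so the integral agrees across values of $\theta$ up to rounding even though the mechanism is not MAR.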
Example 3.2. Now extend Example 3.1 by supposing that $Y_1$ or $Y_2$,
but not both, may be missing. Suppose that the MDM is

$$f(r|y;\psi) = \begin{cases} h_{(1,1)}(y;\psi), & \text{when } r = (1,1),\\ h_{(1,0)}(y;\psi), & \text{when } r = (1,0),\\ h_{(0,1)}(y_2;\psi), & \text{when } r = (0,1),\\ h_{(0,0)} \equiv 0, & \text{when } r = (0,0), \end{cases}$$

where

$$\sum_{i,j} h_{(i,j)}(y;\psi) = 1$$

for all $y$ and $\psi$. Since $h_{(1,0)}$ depends on both $y_1$ and $y_2$, the MDM is not
MAR. However, because $h_{(1,1)}$, $h_{(0,1)}$ and $h_{(0,0)}$ satisfy the MAR condition,
we only need to check the LIG condition for $r = (1,0)$. This is just
the same as Example 3.1, so the LIG condition holds.

Example 3.3. Let $Y = (Y_1, Y_2, \ldots, Y_k)^T$ be i.i.d. $N(\theta,1)$. Then
$S = \frac{1}{k}\sum_{i=1}^k Y_i$ is a sufficient statistic and the vector of sample differences

$$A = (Y_1 - Y_2, Y_2 - Y_3, \ldots, Y_{k-1} - Y_k)^T$$

is an ancillary statistic for $\theta$. These statistics are independent, so we have

$$f(y;\theta) = f(y|s)\, f(s;\theta) = f(a)\, f(s;\theta).$$

Similarly, for any given $r$ with $1^T r < k - 1$, we can define the corresponding
statistics $s_r$ and $a_r$ for the subvector $y^{(1-r)}$.
Now suppose that the MDM takes the form

$$f(r|y;\psi) = h_r(a_r, y^{(r)};\psi).$$

Clearly this is not MAR because $h_r$ depends on $y^{(1-r)}$ through $a_r$. However,

$$\int f(y^{(1-r)}|y^{(r)};\theta)\, f(r|y;\psi)\,dy^{(1-r)} = \iint f(a_r)\, f(s_r;\theta)\, h_r(a_r, y^{(r)};\psi)\,ds_r\,da_r = \int f(a_r)\, h_r(a_r, y^{(r)};\psi)\,da_r,$$

which does not depend on $\theta$. Hence this MDM is LIG.
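
The conclusion of this example can be illustrated numerically for $k = 3$ with $Y_1$ observed and $Y_2, Y_3$ missing. The sketch below is a hedged illustration: the logistic mechanisms, the parameter values and the two-dimensional midpoint rule are hypothetical choices. One mechanism depends on the missing values only through the ancillary difference $y_2 - y_3$; the other depends on the level of $y_2$.

```python
import math


def normal_pdf(x: float, mean: float) -> float:
    u = x - mean
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def h_ancillary(y1, y2, y3, psi):
    """Depends on the missing values only through the difference y2 - y3."""
    return logistic(y1 + psi * (y2 - y3))


def h_level(y1, y2, y3, psi):
    """Depends on the level of the missing y2, which is not ancillary."""
    return logistic(y1 + psi * y2)


def integral_over_missing(h, y1, theta, psi, lo=-10.0, hi=10.0, n=300):
    """2-D midpoint rule for the integral of f(y2, y3; theta) * h over (y2, y3)."""
    step = (hi - lo) / n
    nodes = [lo + (j + 0.5) * step for j in range(n)]
    weights = [normal_pdf(v, theta) for v in nodes]  # Y2, Y3 i.i.d. N(theta, 1)
    total = 0.0
    for y2, w2 in zip(nodes, weights):
        for y3, w3 in zip(nodes, weights):
            total += w2 * w3 * h(y1, y2, y3, psi)
    return total * step * step


thetas = (-1.0, 0.0, 1.0)
anc = [integral_over_missing(h_ancillary, 0.2, t, 1.5) for t in thetas]
lev = [integral_over_missing(h_level, 0.2, t, 1.5) for t in thetas]
spread_anc = max(anc) - min(anc)
spread_lev = max(lev) - min(lev)
print(spread_anc, spread_lev)
```

The first spread is zero up to quadrature error, in line with the example, while the second is clearly nonzero: dependence on a non-ancillary function of the missing values breaks LIG for this family.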
4. Extension to coarsening at random. The coarse data model of Heitjan and Rubin
(1991) is a more general way of describing incomplete data problems. Here 
Z, the observable outcome, is a measurable subset of the sample space, such 
as a half line (when a life time is known to exceed a censoring time) or a 
finite interval (when an observation is rounded). The notion of coarsening 
at random (CAR) was introduced by Heitjan and Rubin (1991) as a natural 
extension of MAR to coarse data and was further studied by Heitjan (1993, 
1994, 1997) and Jacobsen and Keiding (1995). 

Following Heitjan and Rubin (1991), a random variable $G$, the so-called
coarsening variable, defines the measurable subset $Z$ as $Z = Z(Y, G)$.
Equation (2.2) is the special case of this when $G = R$. The conditional distribution
of $G$ given $Y$ defines the coarsening data mechanism (CDM) $h(g,y;\psi)$,
where, again, the parameter $\psi$ is assumed to be distinct from the parameter
$\theta$ in the main model $f(y;\theta)$.

With the CDM $h(g,y;\psi)$, the conditional distribution of $Z$ given $y$ can
be expressed as

$$\kappa(z,y;\psi) = \int_{\{g : Z(y,g) = z\}} h(g,y;\psi)\,dg.$$

For a rigorous expression for this conditional density in the case when G is 
continuous, see Jacobsen and Keiding (1995). 

The following definition of CAR is due to Heitjan and Rubin (1991). 

Definition 4.1. The CDM is CAR if, for any fixed observed subset $z$,
and for each value of $\psi$, $\kappa(z,y;\psi)$ takes the same value for all $y \in z$.

The likelihood function for $\theta$ based on an observed $z$ is proportional to
the probability that $(Y, G)$ falls in the set $\{(y, g) : Z(y, g) = z\}$, which can be
written as

$$\int_z \int_{\{g : Z(y,g) = z\}} f(y;\theta)\, h(g,y;\psi)\,dg\,dy = \int_z f(y;\theta)\, \kappa(z,y;\psi)\,dy.$$

This leads to the following definition.

Definition 4.2. The CDM is said to be LIG if, as functions of $\theta$,

$$\int_z f(y;\theta)\, \kappa(z,y;\psi)\,dy \propto \int_z f(y;\theta)\,dy. \tag{4.1}$$

The generalization of Theorem 3.1 is as follows. 

Theorem 4.1. If $\{f(y;\theta) : \theta \in \Theta\}$ is a complete family, then a necessary
and sufficient condition for LIG is that the CDM is CAR.

That CAR implies LIG is immediate. For the converse, if the CDM
satisfies (4.1), there exists $w(z;\psi)$ such that

$$\int_z f(y;\theta)\{\kappa(z,y;\psi) - w(z;\psi)\}\,dy = 0.$$

We now need a straightforward extension of Lemma 3.1, that if $f(y;\theta)$ is
complete, then so is the conditional distribution of $y$ given $y \in z$ (the proof
follows lines similar to the proof of Lemma 3.1 in the Appendix). This implies
that $\kappa(z,y;\psi) = w(z;\psi)$ and hence the CDM is CAR.

The following example shows the necessity of model completeness for the 
equivalence of CAR and LIG when the coarsening variable G is a continuous 
random variable. 

Example 4.1. Let $Y = (Y_1, Y_2)^T$ be the logarithms of two life times,
assumed to follow the (incomplete) bivariate normal distribution in
Example 3.1. Suppose that $Y_1$ is always observed but $Y_2$ suffers from censoring,
with $G$ the corresponding (log) censoring time in a competing risks
framework. The coarsening function is

$$Z(y,g) = \begin{cases} \{y\}, & \text{if } g > y_2,\\ \{y_1\} \times (g,\infty), & \text{if } g \le y_2. \end{cases}$$



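Suppose further that $(Y, G)$ are jointly Gaussian, with the mean of $Y$ as in
Example 3.1, $E(G) = \psi$ and covariance matrix

$$\begin{pmatrix} 1 & 1/2 & 0\\ 1/2 & 1 & 1/2\\ 0 & 1/2 & 1 \end{pmatrix}.$$

For this model we find

$$h(g,y;\psi) = \frac{1}{\sqrt{2/3}}\, \phi\!\left(\frac{g + (1/3)y_1 - (2/3)y_2 - \psi}{\sqrt{2/3}}\right)$$

and

$$\kappa(z,y;\psi) = \begin{cases} \displaystyle\int_{(y_2,\infty)} \frac{1}{\sqrt{2/3}}\, \phi\!\left(\frac{g + (1/3)y_1 - (2/3)y_2 - \psi}{\sqrt{2/3}}\right) dg, & \text{if } z = \{y\},\\[2ex] \displaystyle\frac{1}{\sqrt{2/3}}\, \phi\!\left(\frac{g + (1/3)y_1 - (2/3)y_2 - \psi}{\sqrt{2/3}}\right), & \text{if } z = \{y_1\} \times (g,\infty), \end{cases}$$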



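where $\phi(\cdot)$ is the standard normal density function. Clearly, $\kappa(z,y;\psi)$ does
not take the same value for all $y \in z$ for each value of $\psi$, and so the CDM is
not CAR. However, it is LIG, because for an observation $z^\circ = \{y_1^\circ\} \times (g^\circ,\infty)$,

$$\int_{(g^\circ,\infty)} f(y_1^\circ,y_2;\theta)\, \kappa(z^\circ,y;\psi)\,dy_2 = f(y_1^\circ;\theta) \int_{(g^\circ,\infty)} f(y_2|y_1^\circ)\, \frac{1}{\sqrt{2/3}}\, \phi\!\left(\frac{g^\circ + (1/3)y_1^\circ - (2/3)y_2 - \psi}{\sqrt{2/3}}\right) dy_2 \propto \int_{(g^\circ,\infty)} f(y_1^\circ,y_2;\theta)\,dy_2,$$

as the conditional density of $y_2$ given $y_1$ is independent of $\theta$.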
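
The coefficients $1/3$, $2/3$ and the scale $\sqrt{2/3}$ appearing in this example can be reproduced from the standard formula for a Gaussian conditional distribution. The sketch below assumes that the covariance matrix of $(Y_1, Y_2, G)$ has unit variances, covariance $1/2$ between $Y_1$ and $Y_2$ and between $Y_2$ and $G$, and covariance $0$ between $Y_1$ and $G$; the zero entry is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def conditional_g_given_y(sigma: List[List[float]]) -> Tuple[float, float, float]:
    """For a 3x3 covariance of (Y1, Y2, G), return (b1, b2, var) such that
    E(G | y) = E(G) + b1*(y1 - E Y1) + b2*(y2 - E Y2) and Var(G | y) = var."""
    a, b = sigma[0][0], sigma[0][1]
    c, d = sigma[1][0], sigma[1][1]
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("covariance of (Y1, Y2) is singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of the Y block
    s_gy = [sigma[2][0], sigma[2][1]]                 # Cov(G, Y1), Cov(G, Y2)
    b1 = s_gy[0] * inv[0][0] + s_gy[1] * inv[1][0]
    b2 = s_gy[0] * inv[0][1] + s_gy[1] * inv[1][1]
    var = sigma[2][2] - (b1 * s_gy[0] + b2 * s_gy[1])
    return b1, b2, var


sigma = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
b1, b2, var = conditional_g_given_y(sigma)
print(b1, b2, var)
```

With these inputs the regression coefficients are $-1/3$ and $2/3$ and the conditional variance is $2/3$. Because $E(Y_1 - 2Y_2) = 0$, the mean terms of $Y$ cancel in the conditional mean of $G$, leaving $\psi - (1/3)y_1 + (2/3)y_2$, which matches the argument of $\phi$ in the example.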

5. Remarks. 

Remark 5.1. In missing data analysis we will usually assume that the 
data arise from n i.i.d. realizations from the joint distribution of (Y, R) 
in (2.1). If the distribution of Y is complete, asking whether the MDM 
affects inference (in the sense of LIG) is then equivalent to asking whether 
R depends on unobserved components of Y. In general, however, MAR is a 
stronger requirement than LIG. In Example 3.3, for instance, Y itself takes 
the form of an i.i.d. sample, but the components of $R$ may be dependent. Now
$R$ can depend in an arbitrary way on ancillary statistics without upsetting
inference about $\theta$.

Remark 5.2. Many familiar statistical models used in practice involve 
replication and i.i.d. residuals and are not complete, such as Example 3.3. 
In normal linear models more generally, ignorable MDMs can still depend 
on the standardized sample residuals. 

Remark 5.3. When covariates, say $X$, are involved in incomplete data
analysis, we may wish to model conditionally on $X$, and hence the
equivalence between LIG and MAR requires the completeness of the conditional
model $f(y|x;\theta)$ for almost all $x \in \mathcal{X}$. If $X$ is fully observable, then
Theorem 3.1 still holds. However, if $X$ may also be missing, the equivalence of
MAR and LIG requires more strongly that the joint density of $(Y, X)$ belong
to a complete parameter family. This situation has already been included in
the above discussion, since some components of $Y$ can be treated as
covariates. However, caution must be taken for the model parameterization, as in
general the parameterization for the joint distribution of $(Y, X)$ is distinct
from that for the conditional distribution of $Y$ on $X$.

Remark 5.4. A special case occurs where $y$ is a scalar random variable
and no covariates are involved in the model. Here $r$ is just 0 or 1, and
MAR requires that $f(0|y;\psi)$ is independent of $y$. However, then
$f(1|y;\psi) = 1 - f(0|y;\psi)$ must be independent of $y$ too, and so $Y$ and $R$ are statistically
independent in the usual sense. This is the missing completely at random
(MCAR) condition [Rubin (1976)]. So in this special case the conclusion of
Theorem 3.1 is that for complete families

$$\text{LIG} \Leftrightarrow \text{MAR} \Leftrightarrow \text{MCAR}.$$
APPENDIX

Proof of Lemma 3.1. Suppose that $f_1\{y^{(1)};\phi_1(\theta)\}$ is not complete.
Then there exists $t(y^{(1)}) \ne 0$ such that

$$\int t(y^{(1)})\, f_1\{y^{(1)};\phi_1(\theta)\}\,dy^{(1)} = 0 \quad \forall \theta \in \Theta \ (\text{or, equivalently, } \forall \phi_1 \in \Phi_1).$$

Then

$$\int t(y^{(1)})\, f(y;\theta)\,dy = \int t(y^{(1)})\, f_1(y^{(1)};\phi_1) \int f_{2|1}\{y^{(2)}|y^{(1)};\phi_2(\theta;y^{(1)})\}\,dy^{(2)}\,dy^{(1)} = 0 \quad \forall \theta \in \Theta,$$

contradicting the completeness of $f(y;\theta)$. Hence (a) is established.

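Now suppose that (b) does not hold. Then there exists some $A \subset \mathcal{Y}^{(1)}$
with nonzero probability under the marginal distribution of $Y^{(1)}$, such that
for any $y^{(1)} \in A$ the family of conditional densities $f_{2|1}\{y^{(2)}|y^{(1)};\phi_2(\theta;y^{(1)})\}$,
$\theta \in \Theta$, is not complete. There must then exist some function $w(y^{(1)},y^{(2)}) \ne 0$
defined for $y^{(1)} \in A$ and $y^{(2)} \in \mathcal{Y}^{(2)}$ such that

$$\int w(y^{(1)},y^{(2)})\, f_{2|1}\{y^{(2)}|y^{(1)};\phi_2(\theta;y^{(1)})\}\,dy^{(2)} = 0 \quad \forall \theta \in \Theta.$$

Now define

$$t(y) = \begin{cases} w(y^{(1)},y^{(2)}), & \text{if } y^{(1)} \in A,\\ 0, & \text{otherwise.} \end{cases}$$

Clearly $t(y) \ne 0$, but

$$\int t(y)\, f(y;\theta)\,dy = \int_A f_1\{y^{(1)};\phi_1(\theta)\} \int w(y^{(1)},y^{(2)})\, f_{2|1}\{y^{(2)}|y^{(1)};\phi_2(\theta;y^{(1)})\}\,dy^{(2)}\,dy^{(1)} = 0 \quad \forall \theta \in \Theta.$$

This again contradicts the completeness of $f(y;\theta)$. □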

Acknowledgments. We are grateful to the referees for their helpful com- 
ments on an earlier draft of this paper. 